Overview


Dataset statistics

Number of variables: 19
Number of observations: 2964624
Missing cells: 700810
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 429.7 MiB
Average record size in memory: 152.0 B
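The headline figures above can be cross-checked against the per-column counts. The sketch below is plain Python. It assumes, as the Alerts list shows, that the only missing values are the 140162 in each of five columns, and that "MiB" means 2^20 bytes. `missing_cell_pct` and `avg_record_bytes` are illustrative helpers, not part of ydata-profiling.

```python
MIB = 1024 * 1024  # assumption: "MiB" in the report means 2**20 bytes


def missing_cell_pct(missing_cells: int, n_rows: int, n_vars: int) -> float:
    """Share of all cells (rows x variables) that are missing, in percent."""
    return 100.0 * missing_cells / (n_rows * n_vars)


def avg_record_bytes(total_mib: float, n_rows: int) -> float:
    """Average in-memory size of one row, in bytes."""
    return total_mib * MIB / n_rows


n_rows, n_vars = 2964624, 19
# Five columns each report 140162 missing values (see Alerts).
missing = 5 * 140162

print(missing)                                              # 700810
print(round(missing_cell_pct(missing, n_rows, n_vars), 1))  # 1.2
print(round(avg_record_bytes(429.7, n_rows), 1))            # 152.0
```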

Variable types

Categorical: 4
DateTime: 2
Numeric: 12
Boolean: 1

Alerts

Airport_fee is highly overall correlated with trip_distance [High correlation]
fare_amount is highly overall correlated with total_amount and 1 other field [High correlation]
improvement_surcharge is highly overall correlated with mta_tax [High correlation]
mta_tax is highly overall correlated with improvement_surcharge [High correlation]
store_and_fwd_flag is highly overall correlated with trip_distance [High correlation]
tip_amount is highly overall correlated with total_amount [High correlation]
total_amount is highly overall correlated with fare_amount and 2 other fields [High correlation]
trip_distance is highly overall correlated with Airport_fee and 3 other fields [High correlation]
store_and_fwd_flag is highly imbalanced (96.2%) [Imbalance]
payment_type is highly imbalanced (55.4%) [Imbalance]
improvement_surcharge is highly imbalanced (95.7%) [Imbalance]
Airport_fee is highly imbalanced (72.9%) [Imbalance]
passenger_count has 140162 (4.7%) missing values [Missing]
RatecodeID has 140162 (4.7%) missing values [Missing]
store_and_fwd_flag has 140162 (4.7%) missing values [Missing]
congestion_surcharge has 140162 (4.7%) missing values [Missing]
Airport_fee has 140162 (4.7%) missing values [Missing]
trip_distance is highly skewed (γ1 = 1001.887885) [Skewed]
passenger_count has 31465 (1.1%) zeros [Zeros]
trip_distance has 60371 (2.0%) zeros [Zeros]
extra has 1290548 (43.5%) zeros [Zeros]
mta_tax has 29707 (1.0%) zeros [Zeros]
tip_amount has 710292 (24.0%) zeros [Zeros]
tolls_amount has 2753809 (92.9%) zeros [Zeros]
congestion_surcharge has 217877 (7.3%) zeros [Zeros]
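ydata-profiling's documentation describes its default "auto" correlation as Spearman's rank correlation for pairs of numeric columns and Cramér's V when a categorical column is involved. Assuming the correlation alerts above use that default (the config.json listed under Reproduction would confirm it), the numeric-pair case can be sketched in plain Python as follows. Ties receive average ranks. The five trips are hypothetical and the helper names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Five hypothetical trips (values chosen for illustration only).
distances = [0.9, 1.7, 3.1, 13.7, 1.0]
fares = [8.6, 12.8, 20.5, 61.8, 7.9]
print(round(spearman(distances, fares), 3))  # 0.9
```

A rank-based coefficient suits this data because trip_distance and fare_amount contain extreme values (maxima of 312722.3 and 5000) that would dominate a Pearson coefficient computed on the raw values.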

Reproduction

Analysis started: 2024-08-02 13:52:53.429004
Analysis finished: 2024-08-02 13:54:15.343148
Duration: 1 minute and 21.91 seconds
Software version: ydata-profiling v4.9.0
Configuration: config.json

Variables

VendorID
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2964624
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
2  2234632  75.4%
1  729732  24.6%
6  260  < 0.1%

Length (Plot)

Histogram of lengths of the category

Common Values (Plot): identical to the Common Values table.

Most occurring characters: every value is a single character, so the character counts repeat the Common Values table (2: 2234632, 1: 729732, 6: 260).

Most occurring categories, scripts and blocks: (unknown), covering all 2964624 characters (100.0%). The most frequent characters per category, per script and per block again repeat the Common Values table.

tpep_pickup_datetime
DateTime

Distinct: 1575706
Distinct (%): 53.2%
Missing: 0
Missing (%): 0.0%
Memory size: 22.6 MiB
Minimum: 2002-12-31 22:59:39
Maximum: 2024-02-01 00:01:15

Histogram with fixed size bins (bins=50)

tpep_dropoff_datetime
DateTime

Distinct: 1574780
Distinct (%): 53.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.6 MiB
Minimum: 2002-12-31 23:05:41
Maximum: 2024-02-02 13:56:52

Histogram with fixed size bins (bins=50)
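The minimum timestamps above date from late 2002, more than two decades before the maxima in early February 2024. A minimal sketch for flagging such records follows. It assumes, although the report does not say so, that the extract is meant to cover January 2024. `WINDOW_START`, `WINDOW_END` and `outside_window` are illustrative names.

```python
from datetime import datetime

# Assumed reporting window (not stated in the report): January 2024, half-open.
WINDOW_START = datetime(2024, 1, 1)
WINDOW_END = datetime(2024, 2, 1)


def outside_window(ts: datetime, start: datetime = WINDOW_START,
                   end: datetime = WINDOW_END) -> bool:
    """True if ts falls outside the half-open interval [start, end)."""
    return not (start <= ts < end)


# Minimum and maximum of the first DateTime variable, plus one in-window value.
for ts in (datetime(2002, 12, 31, 22, 59, 39),
           datetime(2024, 2, 1, 0, 1, 15),
           datetime(2024, 1, 15, 8, 30)):
    print(ts, outside_window(ts))
# 2002-12-31 22:59:39 True
# 2024-02-01 00:01:15 True
# 2024-01-15 08:30:00 False
```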

passenger_count
Real number (ℝ)

Alerts: MISSING, ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 140162
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3392809
Minimum: 0
Maximum: 9
Zeros: 31465
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 3
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.85028169
Coefficient of variation (CV): 0.63487928
Kurtosis: 10.671029
Mean: 1.3392809
Median Absolute Deviation (MAD): 0
Skewness: 3.0389422
Sum: 3782748
Variance: 0.72297896
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
1  2188739  73.8%
2  405103  13.7%
3  91262  3.1%
4  51974  1.8%
5  33506  1.1%
0  31465  1.1%
6  22353  0.8%
8  51  < 0.1%
7  8  < 0.1%
9  1  < 0.1%
(Missing)  140162  4.7%

Minimum 10 values

Value  Count  Frequency (%)
0  31465  1.1%
1  2188739  73.8%
2  405103  13.7%
3  91262  3.1%
4  51974  1.8%
5  33506  1.1%
6  22353  0.8%
7  8  < 0.1%
8  51  < 0.1%
9  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
9  1  < 0.1%
8  51  < 0.1%
7  8  < 0.1%
6  22353  0.8%
5  33506  1.1%
4  51974  1.8%
3  91262  3.1%
2  405103  13.7%
1  2188739  73.8%
0  31465  1.1%
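Because passenger_count takes only ten distinct values, its mean, variance, standard deviation and coefficient of variation can be recomputed from the value counts above. The plain-Python sketch below assumes the report excludes missing values and uses the sample variance (n - 1 in the denominator); with about 2.8 million observations the choice of denominator makes a negligible difference. `weighted_stats` is an illustrative helper.

```python
from math import sqrt

# passenger_count value -> count, copied from the Common values table
# (the 140162 missing values are excluded).
counts = {0: 31465, 1: 2188739, 2: 405103, 3: 91262, 4: 51974,
          5: 33506, 6: 22353, 7: 8, 8: 51, 9: 1}


def weighted_stats(freq):
    """Count, sum, mean, sample variance, std and CV of a value -> count table."""
    n = sum(freq.values())
    total = sum(v * c for v, c in freq.items())
    mean = total / n
    var = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
    std = sqrt(var)
    return n, total, mean, var, std, std / mean


n, total, mean, var, std, cv = weighted_stats(counts)
print(n, total)  # 2824462 3782748
print(f"mean={mean:.7f} var={var:.8f} std={std:.8f} cv={cv:.8f}")
```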

trip_distance
Real number (ℝ)

Alerts: HIGH CORRELATION, SKEWED, ZEROS

Distinct: 4489
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6521692
Minimum: 0
Maximum: 312722.3
Zeros: 60371
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: 0
5th percentile: 0.43
Q1: 1
Median: 1.68
Q3: 3.11
95th percentile: 13.69
Maximum: 312722.3
Range: 312722.3
Interquartile range (IQR): 2.11

Descriptive statistics

Standard deviation: 225.46257
Coefficient of variation (CV): 61.73388
Kurtosis: 1281274.3
Mean: 3.6521692
Median Absolute Deviation (MAD): 0.86
Skewness: 1001.8879
Sum: 10827308
Variance: 50833.372
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
0  60371  2.0%
0.9  40455  1.4%
1  40192  1.4%
0.8  39964  1.3%
1.1  38662  1.3%
0.7  37603  1.3%
1.2  36917  1.2%
1.3  35131  1.2%
1.4  33111  1.1%
0.6  32791  1.1%
Other values (4479)  2569427  86.7%

Minimum 10 values

Value  Count  Frequency (%)
0  60371  2.0%
0.01  2396  0.1%
0.02  1652  0.1%
0.03  1270  < 0.1%
0.04  1012  < 0.1%
0.05  766  < 0.1%
0.06  662  < 0.1%
0.07  576  < 0.1%
0.08  575  < 0.1%
0.09  459  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
312722.3  1  < 0.1%
97793.92  1  < 0.1%
82015.45  1  < 0.1%
72975.97  1  < 0.1%
71752.26  1  < 0.1%
59282.45  1  < 0.1%
59076.43  1  < 0.1%
58298.51  1  < 0.1%
51619.36  1  < 0.1%
44018.64  1  < 0.1%
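The skewness of 1001.8879 and kurtosis of 1281274.3 reported for trip_distance are driven by a handful of extreme values such as 312722.3. The sketch below implements the bias-adjusted estimators that pandas uses for `Series.skew` and `Series.kurt` (excess kurtosis). That ydata-profiling reports these same estimators is an assumption. The sample is hypothetical, built by repeating quantile values from the table above and then adding one extreme value. In a large sample a single outlier can push the skewness toward roughly the square root of the sample size.

```python
from math import sqrt


def _central_moments(xs):
    """n and the 2nd, 3rd and 4th central moments (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4


def skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return g1 * sqrt(n * (n - 1)) / (n - 2)


def excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis G2 (needs n >= 4); 0 for a normal distribution."""
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


# Hypothetical sample: 1000 values built from the quantiles, then one extreme value.
typical = [0.43, 1.0, 1.68, 3.11, 13.69] * 200
with_outlier = typical + [312722.3]
print(round(skewness(typical), 2), round(skewness(with_outlier), 2))
print(round(excess_kurtosis(typical), 2), round(excess_kurtosis(with_outlier), 2))
```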

RatecodeID
Real number (ℝ)

Alerts: MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 140162
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0693594
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 99
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 9.823219
Coefficient of variation (CV): 4.7469854
Kurtosis: 93.209258
Mean: 2.0693594
Median Absolute Deviation (MAD): 0
Skewness: 9.7490649
Sum: 5844827
Variance: 96.495631
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value  Count  Frequency (%)
1  2663350  89.8%
2  98713  3.3%
99  28663  1.0%
5  19410  0.7%
3  7954  0.3%
4  6365  0.2%
6  7  < 0.1%
(Missing)  140162  4.7%

Minimum 10 values

Value  Count  Frequency (%)
1  2663350  89.8%
2  98713  3.3%
3  7954  0.3%
4  6365  0.2%
5  19410  0.7%
6  7  < 0.1%
99  28663  1.0%

Maximum 10 values

Value  Count  Frequency (%)
99  28663  1.0%
6  7  < 0.1%
5  19410  0.7%
4  6365  0.2%
3  7954  0.3%
2  98713  3.3%
1  2663350  89.8%
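RatecodeID is a code rather than a quantity, so its mean of about 2.07 and standard deviation of about 9.8 are driven almost entirely by the 28663 rows with the value 99. The sketch below recomputes the mean from the value counts above, with and without that code. It assumes 99 is a placeholder for an unknown rate rather than a real rate; the report itself does not say. `weighted_mean` is an illustrative helper.

```python
# RatecodeID value -> count, copied from the Common values table
# (the 140162 missing values are excluded).
rate_counts = {1: 2663350, 2: 98713, 3: 7954, 4: 6365, 5: 19410, 6: 7, 99: 28663}


def weighted_mean(freq, exclude=()):
    """Mean of a value -> count table, skipping any codes listed in `exclude`."""
    kept = {v: c for v, c in freq.items() if v not in exclude}
    return sum(v * c for v, c in kept.items()) / sum(kept.values())


print(round(weighted_mean(rate_counts), 4))                # 2.0694
print(round(weighted_mean(rate_counts, exclude={99}), 4))  # 1.0756
```

In practice, a code like this would usually be mapped to a missing value before numeric summaries are computed, or the column would be profiled as categorical.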

store_and_fwd_flag
Boolean

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 140162
Missing (%): 4.7%
Memory size: 5.7 MiB

Common Values

Value  Count  Frequency (%)
False  2813126  94.9%
True  11336  0.4%
(Missing)  140162  4.7%
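The imbalance percentages in the Alerts list are consistent with one minus the normalized Shannon entropy of the value counts, with missing values excluded: 1 - H / ln(k), where k is the number of distinct values. This is a reconstruction checked against the reported figures (96.2% for store_and_fwd_flag, 55.4% for payment_type), not code taken from ydata-profiling. `imbalance` is an illustrative name.

```python
from math import log


def imbalance(counts):
    """1 - H / ln(k): 0 for perfectly balanced counts, approaching 1 as one value dominates."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        return 1.0  # a single value is treated as fully imbalanced
    n = sum(counts)
    entropy = -sum((c / n) * log(c / n) for c in counts)
    return 1 - entropy / log(k)


# Non-missing value counts from this report.
print(round(imbalance([2813126, 11336]), 3))                         # 0.962
print(round(imbalance([2319046, 439191, 140162, 46628, 19597]), 3))  # 0.554
```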

PULocationID
Real number (ℝ)

Distinct: 260
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 166.01788
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 48
Q1: 132
Median: 162
Q3: 234
95th percentile: 249
Maximum: 265
Range: 264
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 63.623914
Coefficient of variation (CV): 0.38323531
Kurtosis: -0.82977668
Mean: 166.01788
Median Absolute Deviation (MAD): 62
Skewness: -0.27225227
Sum: 4.921806 × 10^8
Variance: 4048.0025
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
132  145240  4.9%
161  143471  4.8%
237  142708  4.8%
236  136465  4.6%
162  106717  3.6%
230  106324  3.6%
186  104523  3.5%
142  104080  3.5%
138  89533  3.0%
239  88474  3.0%
Other values (250)  1797089  60.6%

Minimum 10 values

Value  Count  Frequency (%)
1  295  < 0.1%
2  3  < 0.1%
3  105  < 0.1%
4  3568  0.1%
6  21  < 0.1%
7  1811  0.1%
8  11  < 0.1%
9  57  < 0.1%
10  999  < 0.1%
11  58  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
265  1658  0.1%
264  10360  0.3%
263  59797  2.0%
262  42801  1.4%
261  12893  0.4%
260  813  < 0.1%
259  119  < 0.1%
258  185  < 0.1%
257  78  < 0.1%
256  912  < 0.1%

DOLocationID
Real number (ℝ)

Distinct: 261
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 165.11671
Minimum: 1
Maximum: 265
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 43
Q1: 114
Median: 162
Q3: 234
95th percentile: 261
Maximum: 265
Range: 264
Interquartile range (IQR): 120

Descriptive statistics

Standard deviation: 69.31535
Coefficient of variation (CV): 0.41979609
Kurtosis: -0.90603826
Mean: 165.11671
Median Absolute Deviation (MAD): 68
Skewness: -0.37551746
Sum: 4.8950897 × 10^8
Variance: 4804.6177
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
236  142044  4.8%
237  130249  4.4%
161  111942  3.8%
230  90603  3.1%
142  89673  3.0%
239  89105  3.0%
170  86733  2.9%
162  85238  2.9%
141  83562  2.8%
68  74517  2.5%
Other values (251)  1980958  66.8%

Minimum 10 values

Value  Count  Frequency (%)
1  7176  0.2%
2  4  < 0.1%
3  247  < 0.1%
4  11536  0.4%
5  9  < 0.1%
6  62  < 0.1%
7  7738  0.3%
8  45  < 0.1%
9  284  < 0.1%
10  2665  0.1%

Maximum 10 values

Value  Count  Frequency (%)
265  11967  0.4%
264  16116  0.5%
263  64989  2.2%
262  48328  1.6%
261  12617  0.4%
260  2200  0.1%
259  349  < 0.1%
258  732  < 0.1%
257  1096  < 0.1%
256  5465  0.2%

payment_type
Categorical

Alerts: IMBALANCE

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2964624
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1  2319046  78.2%
2  439191  14.8%
0  140162  4.7%
4  46628  1.6%
3  19597  0.7%

Length (Plot)

Histogram of lengths of the category

Common Values (Plot): identical to the Common Values table.

Most occurring characters: every value is a single character, so the character counts repeat the Common Values table (1: 2319046, 2: 439191, 0: 140162, 4: 46628, 3: 19597).

Most occurring categories, scripts and blocks: (unknown), covering all 2964624 characters (100.0%). The most frequent characters per category, per script and per block again repeat the Common Values table.

fare_amount
Real number (ℝ)

Alerts: HIGH CORRELATION

Distinct: 8970
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.175062
Minimum: -899
Maximum: 5000
Zeros: 893
Zeros (%): < 0.1%
Negative: 37448
Negative (%): 1.3%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -899
5th percentile: 5.1
Q1: 8.6
Median: 12.8
Q3: 20.5
95th percentile: 61.8
Maximum: 5000
Range: 5899
Interquartile range (IQR): 11.9

Descriptive statistics

Standard deviation: 18.949548
Coefficient of variation (CV): 1.0426126
Kurtosis: 3653.4671
Mean: 18.175062
Median Absolute Deviation (MAD): 4.9
Skewness: 18.150372
Sum: 53882225
Variance: 359.08536
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
8.6  140879  4.8%
7.9  139456  4.7%
9.3  138462  4.7%
10  135501  4.6%
7.2  133066  4.5%
10.7  127631  4.3%
11.4  120337  4.1%
6.5  118249  4.0%
12.1  112320  3.8%
12.8  103324  3.5%
Other values (8960)  1695399  57.2%

Minimum 10 values

Value  Count  Frequency (%)
-899  1  < 0.1%
-800  2  < 0.1%
-744.3  1  < 0.1%
-709  1  < 0.1%
-700  1  < 0.1%
-670  1  < 0.1%
-669.4  1  < 0.1%
-650  1  < 0.1%
-607.8  1  < 0.1%
-600  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
5000  2  < 0.1%
2500  3  < 0.1%
2221.3  1  < 0.1%
1616.5  1  < 0.1%
1000  1  < 0.1%
912.3  1  < 0.1%
899  1  < 0.1%
820  1  < 0.1%
800  2  < 0.1%
761.1  1  < 0.1%

extra
Real number (ℝ)

Alerts: ZEROS

Distinct: 48
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4515984
Minimum: -7.5
Maximum: 14.25
Zeros: 1290548
Zeros (%): 43.5%
Negative: 17548
Negative (%): 0.6%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -7.5
5th percentile: 0
Q1: 0
Median: 1
Q3: 2.5
95th percentile: 5
Maximum: 14.25
Range: 21.75
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 1.8041025
Coefficient of variation (CV): 1.2428385
Kurtosis: 2.7855932
Mean: 1.4515984
Median Absolute Deviation (MAD): 1
Skewness: 1.3976617
Sum: 4303443.5
Variance: 3.2547857
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=48)

Common values

Value  Count  Frequency (%)
0  1290548  43.5%
2.5  705767  23.8%
1  526527  17.8%
5  192426  6.5%
3.5  143201  4.8%
6  23477  0.8%
7.5  22407  0.8%
9.25  10506  0.4%
-1  10287  0.3%
4.25  9767  0.3%
Other values (38)  29711  1.0%

Minimum 10 values

Value  Count  Frequency (%)
-7.5  227  < 0.1%
-6  319  < 0.1%
-5  1146  < 0.1%
-3.5  1  < 0.1%
-2.5  5564  0.2%
-1.5  3  < 0.1%
-1  10287  0.3%
-0.04  1  < 0.1%
0  1290548  43.5%
0.01  2  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
14.25  2  < 0.1%
12.5  1  < 0.1%
11.75  2440  0.1%
10.25  2911  0.1%
10  642  < 0.1%
9.95  1  < 0.1%
9.25  10506  0.4%
8.5  394  < 0.1%
8.2  2  < 0.1%
7.75  2373  0.1%

mta_tax
Real number (ℝ)

Alerts: HIGH CORRELATION, ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.48338231
Minimum: -0.5
Maximum: 4
Zeros: 29707
Zeros (%): 1.0%
Negative: 34434
Negative (%): 1.2%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -0.5
5th percentile: 0.5
Q1: 0.5
Median: 0.5
Q3: 0.5
95th percentile: 0.5
Maximum: 4
Range: 4.5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.11776003
Coefficient of variation (CV): 0.24361676
Kurtosis: 57.743719
Mean: 0.48338231
Median Absolute Deviation (MAD): 0
Skewness: -7.4054623
Sum: 1433046.8
Variance: 0.013867425
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value  Count  Frequency (%)
0.5  2900474  97.8%
-0.5  34434  1.2%
0  29707  1.0%
4  5  < 0.1%
1.6  1  < 0.1%
0.8  1  < 0.1%
1.4  1  < 0.1%
3  1  < 0.1%

Minimum 10 values

Value  Count  Frequency (%)
-0.5  34434  1.2%
0  29707  1.0%
0.5  2900474  97.8%
0.8  1  < 0.1%
1.4  1  < 0.1%
1.6  1  < 0.1%
3  1  < 0.1%
4  5  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
4  5  < 0.1%
3  1  < 0.1%
1.6  1  < 0.1%
1.4  1  < 0.1%
0.8  1  < 0.1%
0.5  2900474  97.8%
0  29707  1.0%
-0.5  34434  1.2%

tip_amount
Real number (ℝ)

Alerts: HIGH CORRELATION, ZEROS

Distinct: 4192
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.33587
Minimum: -80
Maximum: 428
Zeros: 710292
Zeros (%): 24.0%
Negative: 102
Negative (%): < 0.1%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -80
5th percentile: 0
Q1: 1
Median: 2.7
Q3: 4.12
95th percentile: 11.2
Maximum: 428
Range: 508
Interquartile range (IQR): 3.12

Descriptive statistics

Standard deviation: 3.8965506
Coefficient of variation (CV): 1.1680763
Kurtosis: 173.6392
Mean: 3.33587
Median Absolute Deviation (MAD): 1.7
Skewness: 5.0541375
Sum: 9889600.3
Variance: 15.183107
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
0  710292  24.0%
2  145946  4.9%
1  113565  3.8%
3  75150  2.5%
5  39511  1.3%
2.8  39085  1.3%
3.5  33831  1.1%
2.1  32854  1.1%
4  31961  1.1%
1.5  31215  1.1%
Other values (4182)  1711214  57.7%

Minimum 10 values

Value  Count  Frequency (%)
-80  1  < 0.1%
-66.02  1  < 0.1%
-65.1  1  < 0.1%
-52  1  < 0.1%
-37.58  1  < 0.1%
-33  1  < 0.1%
-22.24  1  < 0.1%
-22  2  < 0.1%
-17.59  1  < 0.1%
-16.19  3  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
428  1  < 0.1%
422.7  1  < 0.1%
303  1  < 0.1%
300  1  < 0.1%
280  1  < 0.1%
250  1  < 0.1%
220.88  1  < 0.1%
202  2  < 0.1%
175.17  1  < 0.1%
150  1  < 0.1%

tolls_amount
Real number (ℝ)

Alerts: ZEROS

Distinct: 1127
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5270212
Minimum: -80
Maximum: 115.92
Zeros: 2753809
Zeros (%): 92.9%
Negative: 2035
Negative (%): 0.1%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -80
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 6.94
Maximum: 115.92
Range: 195.92
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.1283097
Coefficient of variation (CV): 4.0383758
Kurtosis: 72.868008
Mean: 0.5270212
Median Absolute Deviation (MAD): 0
Skewness: 5.4859052
Sum: 1562419.7
Variance: 4.5297021
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
0  2753809  92.9%
6.94  191910  6.5%
13.38  2031  0.1%
-6.94  1685  0.1%
3.18  1417  < 0.1%
15.38  1378  < 0.1%
13.88  1144  < 0.1%
12.75  891  < 0.1%
14.75  574  < 0.1%
20.32  360  < 0.1%
Other values (1117)  9425  0.3%

Minimum 10 values

Value  Count  Frequency (%)
-80  1  < 0.1%
-60  1  < 0.1%
-56.64  1  < 0.1%
-55.34  1  < 0.1%
-54.02  1  < 0.1%
-52.57  1  < 0.1%
-50  2  < 0.1%
-49.26  1  < 0.1%
-48.75  1  < 0.1%
-47.26  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
115.92  1  < 0.1%
101.69  1  < 0.1%
99  1  < 0.1%
95.46  1  < 0.1%
90  1  < 0.1%
87  1  < 0.1%
85  2  < 0.1%
83  2  < 0.1%
82  1  < 0.1%
81  6  < 0.1%

improvement_surcharge
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 22.6 MiB

Length

Max length: 4
Median length: 3
Mean length: 3.0119752
Min length: 3

Characters and Unicode

Total characters: 8929374
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0  2927710  98.8%
-1.0  35500  1.2%
0.0  838  < 0.1%
0.3  574  < 0.1%
-0.3  2  < 0.1%

Length (Plot)

Histogram of lengths of the category

Common Values (Plot)

The plotted counts equal the combined counts of each value and its negative (for example, 2927710 + 35500 = 2963210 for 1.0 and -1.0, and 574 + 2 = 576 for 0.3 and -0.3), so they differ from the Common Values table:

Value  Count  Frequency (%)
1.0  2963210  > 99.9%
0.0  838  < 0.1%
0.3  576  < 0.1%

Most occurring characters

Value  Count  Frequency (%)
0  2965462  33.2%
.  2964624  33.2%
1  2963210  33.2%
-  35502  0.4%
3  576  < 0.1%

Most occurring categories, scripts and blocks: (unknown), covering all 8929374 characters (100.0%). The most frequent characters per category, per script and per block repeat the Most occurring characters table.
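The Length and character statistics of a categorical variable follow directly from its value counts. For improvement_surcharge, whose values are profiled as strings such as "1.0" and "-1.0", a plain-Python sketch looks like this. `length_stats` and `char_counts` are illustrative helpers, not ydata-profiling functions.

```python
from collections import Counter

# improvement_surcharge value -> count, copied from the Common Values table.
value_counts = {"1.0": 2927710, "-1.0": 35500, "0.0": 838, "0.3": 574, "-0.3": 2}


def length_stats(freq):
    """Total characters, mean length, min and max length of a string -> count table."""
    n = sum(freq.values())
    total_chars = sum(len(s) * c for s, c in freq.items())
    lengths = [len(s) for s in freq]
    return total_chars, total_chars / n, min(lengths), max(lengths)


def char_counts(freq):
    """How often each character occurs across all values, weighted by count."""
    chars = Counter()
    for s, c in freq.items():
        for ch in s:
            chars[ch] += c
    return chars


total_chars, mean_len, min_len, max_len = length_stats(value_counts)
print(total_chars, min_len, max_len)  # 8929374 3 4
print(round(mean_len, 7))             # 3.0119752
chars = char_counts(value_counts)
print(chars["0"], chars["-"])         # 2965462 35502
```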

total_amount
Real number (ℝ)

Alerts: HIGH CORRELATION

Distinct: 19241
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.801505
Minimum: -900
Maximum: 5000
Zeros: 416
Zeros (%): < 0.1%
Negative: 35504
Negative (%): 1.2%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -900
5th percentile: 10.87
Q1: 15.38
Median: 20.1
Q3: 28.56
95th percentile: 80.19
Maximum: 5000
Range: 5900
Interquartile range (IQR): 13.18

Descriptive statistics

Standard deviation: 23.385577
Coefficient of variation (CV): 0.87254718
Kurtosis: 1570.4795
Mean: 26.801505
Median Absolute Deviation (MAD): 5.8
Skewness: 10.68236
Sum: 79456384
Variance: 546.88523
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
16.8  45432  1.5%
12.6  43275  1.5%
21  36556  1.2%
15.12  26687  0.9%
15.96  26396  0.9%
14.28  25970  0.9%
17.64  24525  0.8%
18.48  24349  0.8%
13.44  23938  0.8%
19.32  23514  0.8%
Other values (19231)  2663982  89.9%

Minimum 10 values

Value  Count  Frequency (%)
-900  1  < 0.1%
-801  2  < 0.1%
-753.74  1  < 0.1%
-710  1  < 0.1%
-695.75  1  < 0.1%
-671  1  < 0.1%
-652.75  1  < 0.1%
-637.87  1  < 0.1%
-591  1  < 0.1%
-578.96  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
5000  2  < 0.1%
2500  3  < 0.1%
2225.3  1  < 0.1%
1617.5  1  < 0.1%
1000  1  < 0.1%
940.93  1  < 0.1%
900  1  < 0.1%
821  1  < 0.1%
801  2  < 0.1%
775.48  1  < 0.1%
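The high correlation between total_amount and fare_amount, and the closely matching extreme values in the two columns (for example 2221.3 and 2225.3, or -899 and -900), suggest that total_amount is largely the sum of the individual charges. A row-level consistency check is a natural follow-up. The sketch below assumes, without confirmation from this report, that total_amount should equal the sum of the eight charge columns listed in the hypothetical `COMPONENTS` tuple. The row dictionaries are hypothetical too, and missing charges are treated as 0.

```python
# Assumption: these charge columns are expected to add up to total_amount.
COMPONENTS = ("fare_amount", "extra", "mta_tax", "tip_amount", "tolls_amount",
              "improvement_surcharge", "congestion_surcharge", "Airport_fee")


def total_gap(row, tol=0.01):
    """Difference between total_amount and the summed components, or None if within tol.

    Missing components (None or absent) are treated as 0.
    """
    expected = sum(row.get(col) or 0.0 for col in COMPONENTS)
    gap = round(row["total_amount"] - expected, 2)
    return None if abs(gap) <= tol else gap


# Hypothetical rows for illustration.
consistent = {"fare_amount": 12.8, "extra": 1.0, "mta_tax": 0.5, "tip_amount": 2.7,
              "tolls_amount": 0.0, "improvement_surcharge": 1.0,
              "congestion_surcharge": 2.5, "Airport_fee": None, "total_amount": 20.5}
inconsistent = dict(consistent, total_amount=25.0)
print(total_gap(consistent))    # None
print(total_gap(inconsistent))  # 4.5
```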

congestion_surcharge
Real number (ℝ)

Alerts: MISSING, ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 140162
Missing (%): 4.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2561221
Minimum: -2.5
Maximum: 2.5
Zeros: 217877
Zeros (%): 7.3%
Negative: 28825
Negative (%): 1.0%
Memory size: 22.6 MiB

Quantile statistics

Minimum: -2.5
5th percentile: 0
Q1: 2.5
Median: 2.5
Q3: 2.5
95th percentile: 2.5
Maximum: 2.5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.82327467
Coefficient of variation (CV): 0.36490697
Kurtosis: 12.724867
Mean: 2.2561221
Median Absolute Deviation (MAD): 0
Skewness: -3.5314914
Sum: 6372331
Variance: 0.67778118
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
2.5  2577755  87.0%
0  217877  7.3%
-2.5  28824  1.0%
0.75  3  < 0.1%
1  2  < 0.1%
-0.75  1  < 0.1%
(Missing)  140162  4.7%

Minimum 10 values

Value  Count  Frequency (%)
-2.5  28824  1.0%
-0.75  1  < 0.1%
0  217877  7.3%
0.75  3  < 0.1%
1  2  < 0.1%
2.5  2577755  87.0%

Maximum 10 values

Value  Count  Frequency (%)
2.5  2577755  87.0%
1  2  < 0.1%
0.75  3  < 0.1%
0  217877  7.3%
-0.75  1  < 0.1%
-2.5  28824  1.0%

Airport_fee
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 140162
Missing (%): 4.7%
Memory size: 22.6 MiB

Length

Max length: 5
Median length: 3
Mean length: 3.0858903
Min length: 3

Characters and Unicode

Total characters: 8715980
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0  2586789  87.3%
1.75  232752  7.9%
-1.75  4921  0.2%
(Missing)  140162  4.7%

Length (Plot)

Histogram of lengths of the category

Common Values (Plot)

The plotted counts exclude missing values and combine 1.75 with -1.75 (232752 + 4921 = 237673), so they differ from the Common Values table:

Value  Count  Frequency (%)
0.0  2586789  91.6%
1.75  237673  8.4%

Most occurring characters

Value  Count  Frequency (%)
0  5173578  59.4%
.  2824462  32.4%
1  237673  2.7%
7  237673  2.7%
5  237673  2.7%
-  4921  0.1%

Most occurring categories, scripts and blocks: (unknown), covering all 8715980 characters (100.0%). The most frequent characters per category, per script and per block repeat the Most occurring characters table.

Interactions

2024-08-02T15:53:56.130644image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:58.208457image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:00.316266image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:02.683110image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:04.769122image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:06.955698image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:43.332199image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:45.428991image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:47.508926image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:49.750304image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:52.044326image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:54.236629image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:56.294122image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:58.377415image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:00.468439image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:02.848318image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:04.929978image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:07.123433image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:43.510648image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:45.593749image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:47.686508image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:49.932979image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:52.226878image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:54.407873image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:56.467154image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:58.543473image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:00.658388image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:03.010274image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:05.102983image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:07.292452image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:43.698506image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:45.763814image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:47.862935image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:50.125152image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:52.419194image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:54.582330image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:56.636249image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:58.734530image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:00.855716image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:03.181497image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:05.266610image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:07.456525image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:43.876577image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:45.937567image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:48.037237image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:50.324369image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:52.615459image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:54.758916image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:56.815152image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:53:58.908075image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:01.043456image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:03.360806image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/
2024-08-02T15:54:05.442832image/svg+xmlMatplotlib v3.9.1, https://matplotlib.org/

Correlations

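| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport_fee | 1.000 | 0.071 | 0.376 | 0.031 | 0.053 | 0.315 | 0.480 | 0.027 | 0.263 | 0.255 | 0.030 | 0.146 | 0.005 | 0.056 | 0.268 | 0.032 | 1.000 |
| DOLocationID | 0.071 | 1.000 | 0.083 | -0.053 | 0.009 | 0.112 | 0.001 | -0.101 | 0.012 | 0.024 | -0.007 | 0.032 | 0.005 | -0.006 | -0.053 | -0.088 | -0.097 |
| PULocationID | 0.376 | 0.083 | 1.000 | -0.135 | 0.029 | 0.167 | -0.038 | -0.149 | 0.015 | 0.020 | -0.016 | 0.042 | 0.004 | -0.040 | -0.138 | -0.140 | -0.148 |
| RatecodeID | 0.031 | -0.053 | -0.135 | 1.000 | 0.180 | -0.289 | -0.117 | 0.352 | 0.017 | -0.264 | 0.049 | 0.047 | 0.006 | 0.088 | 0.478 | 0.334 | 0.264 |
| VendorID | 0.053 | 0.009 | 0.029 | 0.180 | 1.000 | 0.073 | 0.448 | 0.002 | 0.476 | 0.044 | 0.202 | 0.062 | 0.096 | 0.005 | 0.011 | 0.003 | 0.000 |
| congestion_surcharge | 0.315 | 0.112 | 0.167 | -0.289 | 0.073 | 1.000 | 0.093 | -0.135 | 0.452 | 0.443 | 0.011 | 0.305 | 0.006 | 0.110 | -0.089 | -0.094 | -0.159 |
| extra | 0.480 | 0.001 | -0.038 | -0.117 | 0.448 | 0.093 | 1.000 | 0.063 | 0.351 | 0.155 | -0.043 | 0.220 | 0.068 | 0.173 | 0.153 | 0.167 | 0.101 |
| fare_amount | 0.027 | -0.101 | -0.149 | 0.352 | 0.002 | -0.135 | 0.063 | 1.000 | 0.039 | 0.059 | 0.043 | 0.010 | 0.000 | 0.429 | 0.416 | 0.964 | 0.865 |
| improvement_surcharge | 0.263 | 0.012 | 0.015 | 0.017 | 0.476 | 0.452 | 0.351 | 0.039 | 1.000 | 0.501 | 0.014 | 0.284 | 0.031 | 0.007 | 0.119 | 0.040 | 0.021 |
| mta_tax | 0.255 | 0.024 | 0.020 | -0.264 | 0.044 | 0.443 | 0.155 | 0.059 | 0.501 | 1.000 | -0.025 | 0.280 | 0.008 | 0.079 | -0.051 | 0.063 | 0.026 |
| passenger_count | 0.030 | -0.007 | -0.016 | 0.049 | 0.202 | 0.011 | -0.043 | 0.043 | 0.014 | -0.025 | 1.000 | 0.034 | 0.033 | 0.009 | 0.044 | 0.041 | 0.036 |
| payment_type | 0.146 | 0.032 | 0.042 | 0.047 | 0.062 | 0.305 | 0.220 | 0.010 | 0.284 | 0.280 | 0.034 | 1.000 | 0.009 | 0.014 | 0.067 | 0.010 | 0.005 |
| store_and_fwd_flag | 0.005 | 0.005 | 0.004 | 0.006 | 0.096 | 0.006 | 0.068 | 0.000 | 0.031 | 0.008 | 0.033 | 0.009 | 1.000 | 0.002 | 0.003 | 0.000 | 1.000 |
| tip_amount | 0.056 | -0.006 | -0.040 | 0.088 | 0.005 | 0.110 | 0.173 | 0.429 | 0.007 | 0.079 | 0.009 | 0.014 | 0.002 | 1.000 | 0.255 | 0.575 | 0.411 |
| tolls_amount | 0.268 | -0.053 | -0.138 | 0.478 | 0.011 | -0.089 | 0.153 | 0.416 | 0.119 | -0.051 | 0.044 | 0.067 | 0.003 | 0.255 | 1.000 | 0.427 | 0.397 |
| total_amount | 0.032 | -0.088 | -0.140 | 0.334 | 0.003 | -0.094 | 0.167 | 0.964 | 0.040 | 0.063 | 0.041 | 0.010 | 0.000 | 0.575 | 0.427 | 1.000 | 0.845 |
| trip_distance | 1.000 | -0.097 | -0.148 | 0.264 | 0.000 | -0.159 | 0.101 | 0.865 | 0.021 | 0.026 | 0.036 | 0.005 | 1.000 | 0.411 | 0.397 | 0.845 | 1.000 |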
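The report does not label which coefficient produced each cell, and ydata-profiling can compute several (Pearson, Spearman, Kendall, Cramér's V, φK). Purely as an illustration, here is a minimal plain-Python sketch of Spearman's rank correlation for one pair of numeric columns, with tied values given their average rank. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical, not part of ydata-profiling. With pandas, the equivalent for numeric columns is `df[numeric_columns].corr(method="spearman")`.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run while the next value in sorted order is a tie.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(ss_x * ss_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# fare_amount and total_amount for the first ten rows listed under Sample
fares = [17.7, 10.0, 23.3, 10.0, 7.9, 29.6, 45.7, 25.4, 31.0, 3.0]
totals = [22.70, 18.75, 31.30, 17.00, 16.10, 41.50, 64.95, 30.40, 36.00, 8.00]
rho = spearman(fares, totals)
print(f"Spearman rho on the sample rows: {rho:.3f}")  # Spearman rho on the sample rows: 0.973
```

Ten rows are far too few to stand in for the full 2,964,624 observations, so this figure is not expected to match the report's 0.964 for the same pair.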

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you pick out patterns in data completion at a glance.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
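The Overview's alerts report exactly 140162 missing values in each of passenger_count, RatecodeID, store_and_fwd_flag, congestion_surcharge and Airport_fee, and the last sample rows show all five missing at once. The following minimal plain-Python sketch shows one way to check that. It correlates the 0/1 missingness indicators of two columns, which is our assumption of what the heatmap shows, following the approach of the missingno library. It also counts the rows where any, and where all, of a group of columns are missing. If the two counts are equal, the columns go missing in exactly the same rows. The helper names are hypothetical. In pandas, the indicator correlation is `df.isna().corr()`.

```python
from math import isnan, sqrt


def is_missing(value):
    """Treat None and float NaN as missing, in the spirit of pandas' isna()."""
    return value is None or (isinstance(value, float) and isnan(value))


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 0/1 missingness indicators of two columns.

    +1: always missing together; -1: exactly one of the two is missing in
    every row; near 0: no relationship between their missingness.
    """
    a = [1.0 if is_missing(v) else 0.0 for v in col_a]
    b = [1.0 if is_missing(v) else 0.0 for v in col_b]
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("columns must have the same length (>= 2)")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        # A column that is never (or always) missing has no nullity variation.
        raise ValueError("undefined for a column whose nullity never varies")
    return cov / sqrt(ss_a * ss_b)


def rows_missing(columns, how="any"):
    """Count rows in which any (or all) of the given columns are missing."""
    combine = {"any": any, "all": all}[how]
    return sum(1 for row in zip(*columns) if combine(is_missing(v) for v in row))


nan = float("nan")
# Two rows from the start and two from the end of the Sample section
passenger_count = [1.0, 1.0, nan, nan]
ratecode_id = [1.0, 1.0, nan, nan]
store_and_fwd_flag = ["N", "N", None, None]

print(nullity_correlation(passenger_count, ratecode_id))  # 1.0
cols = [passenger_count, ratecode_id, store_and_fwd_flag]
print(rows_missing(cols, "any"), rows_missing(cols, "all"))  # 2 2
```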

Sample

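First rows

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2024-01-01 00:57:55 | 2024-01-01 01:17:43 | 1.0 | 1.72 | 1.0 | N | 186 | 79 | 2 | 17.7 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 22.70 | 2.5 | 0.00 |
| 1 | 1 | 2024-01-01 00:03:00 | 2024-01-01 00:09:36 | 1.0 | 1.80 | 1.0 | N | 140 | 236 | 1 | 10.0 | 3.5 | 0.5 | 3.75 | 0.0 | 1.0 | 18.75 | 2.5 | 0.00 |
| 2 | 1 | 2024-01-01 00:17:06 | 2024-01-01 00:35:01 | 1.0 | 4.70 | 1.0 | N | 236 | 79 | 1 | 23.3 | 3.5 | 0.5 | 3.00 | 0.0 | 1.0 | 31.30 | 2.5 | 0.00 |
| 3 | 1 | 2024-01-01 00:36:38 | 2024-01-01 00:44:56 | 1.0 | 1.40 | 1.0 | N | 79 | 211 | 1 | 10.0 | 3.5 | 0.5 | 2.00 | 0.0 | 1.0 | 17.00 | 2.5 | 0.00 |
| 4 | 1 | 2024-01-01 00:46:51 | 2024-01-01 00:52:57 | 1.0 | 0.80 | 1.0 | N | 211 | 148 | 1 | 7.9 | 3.5 | 0.5 | 3.20 | 0.0 | 1.0 | 16.10 | 2.5 | 0.00 |
| 5 | 1 | 2024-01-01 00:54:08 | 2024-01-01 01:26:31 | 1.0 | 4.70 | 1.0 | N | 148 | 141 | 1 | 29.6 | 3.5 | 0.5 | 6.90 | 0.0 | 1.0 | 41.50 | 2.5 | 0.00 |
| 6 | 2 | 2024-01-01 00:49:44 | 2024-01-01 01:15:47 | 2.0 | 10.82 | 1.0 | N | 138 | 181 | 1 | 45.7 | 6.0 | 0.5 | 10.00 | 0.0 | 1.0 | 64.95 | 0.0 | 1.75 |
| 7 | 1 | 2024-01-01 00:30:40 | 2024-01-01 00:58:40 | 0.0 | 3.00 | 1.0 | N | 246 | 231 | 2 | 25.4 | 3.5 | 0.5 | 0.00 | 0.0 | 1.0 | 30.40 | 2.5 | 0.00 |
| 8 | 2 | 2024-01-01 00:26:01 | 2024-01-01 00:54:12 | 1.0 | 5.44 | 1.0 | N | 161 | 261 | 2 | 31.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 36.00 | 2.5 | 0.00 |
| 9 | 2 | 2024-01-01 00:28:08 | 2024-01-01 00:29:16 | 1.0 | 0.04 | 1.0 | N | 113 | 113 | 2 | 3.0 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 8.00 | 2.5 | 0.00 |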
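
Last rows

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2964614 | 2 | 2024-01-31 23:23:02 | 2024-01-31 23:37:53 | NaN | 2.50 | NaN | None | 50 | 137 | 0 | 17.09 | 0.00 | 0.5 | 3.16 | 0.00 | 1.0 | 24.25 | NaN | NaN |
| 2964615 | 1 | 2024-01-31 23:54:30 | 2024-02-01 00:01:05 | NaN | 1.20 | NaN | None | 114 | 231 | 0 | 8.60 | 1.00 | 0.5 | 2.72 | 0.00 | 1.0 | 16.32 | NaN | NaN |
| 2964616 | 2 | 2024-01-31 23:40:02 | 2024-01-31 23:47:45 | NaN | 1.89 | NaN | None | 48 | 68 | 0 | 12.54 | 0.00 | 0.5 | 3.31 | 0.00 | 1.0 | 19.85 | NaN | NaN |
| 2964617 | 2 | 2024-01-31 23:27:00 | 2024-01-31 23:43:00 | NaN | 8.99 | NaN | None | 50 | 127 | 0 | 35.24 | 0.00 | 0.5 | 7.85 | 0.00 | 1.0 | 47.09 | NaN | NaN |
| 2964618 | 1 | 2024-01-31 23:18:48 | 2024-01-31 23:38:05 | NaN | 3.90 | NaN | None | 90 | 238 | 0 | 21.90 | 1.00 | 0.5 | 4.03 | 0.00 | 1.0 | 30.93 | NaN | NaN |
| 2964619 | 2 | 2024-01-31 23:45:59 | 2024-01-31 23:54:36 | NaN | 3.18 | NaN | None | 107 | 263 | 0 | 15.77 | 0.00 | 0.5 | 2.00 | 0.00 | 1.0 | 21.77 | NaN | NaN |
| 2964620 | 1 | 2024-01-31 23:13:07 | 2024-01-31 23:27:52 | NaN | 4.00 | NaN | None | 114 | 236 | 0 | 18.40 | 1.00 | 0.5 | 2.34 | 0.00 | 1.0 | 25.74 | NaN | NaN |
| 2964621 | 2 | 2024-01-31 23:19:00 | 2024-01-31 23:38:00 | NaN | 3.33 | NaN | None | 211 | 25 | 0 | 19.97 | 0.00 | 0.5 | 0.00 | 0.00 | 1.0 | 23.97 | NaN | NaN |
| 2964622 | 2 | 2024-01-31 23:07:23 | 2024-01-31 23:25:14 | NaN | 3.06 | NaN | None | 107 | 13 | 0 | 23.88 | 0.00 | 0.5 | 5.58 | 0.00 | 1.0 | 33.46 | NaN | NaN |
| 2964623 | 1 | 2024-01-31 23:58:25 | 2024-02-01 00:13:30 | NaN | 8.10 | NaN | None | 138 | 75 | 0 | 32.40 | 7.75 | 0.5 | 7.29 | 6.94 | 1.0 | 55.88 | NaN | NaN |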